Converting gene Symbol aliases to the current Symbol for all species
Requirement
In data analysis, a gene of interest is often missing from the matrix. Two likely causes are that the gene was not detected, or that it is listed under an outdated Symbol. It is therefore necessary to find the correspondence between old Symbols (aliases) and the latest Symbol (current Symbol).

bq.tl.current_symbol converts the Symbols in an (expression) matrix to the latest version.
First argument: a DataFrame whose index is the Symbol.
Second argument: the path to the Symbol-to-Alias mapping file.
Third argument: the species tax_id, for example 9606 for human.

SymbolAlias_20230317.feather can be obtained by sending an email to [email protected]

Download the latest gene information from NCBI

https://ftp.ncbi.nih.gov/gene/DATA/gene_info.gz

    import numpy as np
    import pandas as pd
    import bioquest as bq

Build the Symbol-to-Alias mapping

    # Read only the columns needed from the NCBI gene_info dump
    g = pd.read_csv("gene_info_20230317.gz", sep='\t', usecols=['#tax_id', 'GeneID', 'Symbol', 'Synonyms'])
    g.rename(columns={"#tax_id": "tax_id"}, inplace=True)
    # Synonyms are separated by '|'; split them and emit one row per alias
    g.loc[:, "Alias"] = g.Synonyms.str.split('|')
    g = g.explode("Alias")
    g = bq.tl.select(g, columns=["tax_id", "GeneID", "Symbol", "Alias"])
    g.reset_index(drop=True, inplace=True)
    # NCBI writes '-' when a gene has no synonyms; store that as an empty string
    g.replace({'Alias': {'-': ''}}, inplace=True)
    g.to_feather("SymbolAlias_20230317.feather", compression='zstd', compression_level=1)

The resulting table:

               tax_id    GeneID       Symbol  Alias
    0               7   5692769     NEWENTRY
    1               9   2827857     NEWENTRY
    2              11  10823747     NEWENTRY
    3              14   6951813     NEWENTRY
    4              19   3758873     NEWENTRY
    ...           ...       ...          ...    ...
    44205723  3032134  60460443          ND6
    44205724  3032134  60460444          ND1
    44205725  3032134  60460445  I9997_mgr02
    44205726  3032134  60460446  I9997_mgt22
    44205727  3032134  60460447  I9997_mgr01

    [44205728 rows x 4 columns]

Usage example

Example data

    df = pd.read_csv("BLCA.csv", index_col="Gene Symbol")
    #              Gene Name                                          Species
    # Gene Symbol
    # ATP2B1       ATPase, Ca++ transporting, plasma membrane 1       Homo sapiens
    # MYL6         myosin, light chain 6, alkali, smooth muscle a...  Homo sapiens
    # RPS16        ribosomal protein S16                              Homo sapiens
    # HIST1H2BA    histone cluster 1, H2ba                            Homo sapiens
    # H2AFY2       H2A histone family, member Y2                      Homo sapiens
    # ...          ...                                                ...
    # UBB          ubiquitin B                                        Homo sapiens
    # PYGB         phosphorylase, glycogen; brain                     Homo sapiens
    # HLA-A        major histocompatibility complex, class I, A       Homo sapiens
    # HSPA1A       heat shock 70kDa protein 1A                        Homo sapiens
    # HSP90AB1     heat shock protein 90kDa alpha (cytosolic), cl...  Homo sapiens

Conversion

    bq.tl.current_symbol(frame=df, reference="SymbolAlias_20230317.feather", tax_id=9606)
    #            Gene Name                                          Species       Alias
    # H2BC1      histone cluster 1, H2ba                            Homo sapiens  HIST1H2BA
    # MACROH2A2  H2A histone family, member Y2                      Homo sapiens  H2AFY2
    # H3-3B      H3 histone, family 3B (H3.3B)                      Homo sapiens  H3F3B
    # H1-5       histone cluster 1, H1b                             Homo sapiens  HIST1H1B
    # DARS1      aspartyl-tRNA synthetase                           Homo sapiens  DARS
    # ...        ...                                                ...           ...
    # UBB        ubiquitin B                                        Homo sapiens  NaN
    # PYGB       phosphorylase, glycogen; brain                     Homo sapiens  NaN
    # HLA-A      major histocompatibility complex, class I, A       Homo sapiens  NaN
    # HSPA1A     heat shock 70kDa protein 1A                        Homo sapiens  NaN
    # HSP90AB1   heat shock protein 90kDa alpha (cytosolic), cl...  Homo sapiens  NaN
    # [378 rows x 3 columns]
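The internals of bq.tl.current_symbol are not shown in this post. For readers without bioquest, the alias lookup can be roughly sketched in plain pandas as below. This is only an illustration under stated assumptions, not the bioquest implementation: the helper name to_current_symbol is hypothetical, the toy mapping table reuses three human pairs from the output above, and the tax_id 10090 row is made up purely to show the species filter. The sketch skips aliases that point to more than one current Symbol, which may differ from what bioquest does.

```python
import pandas as pd


def to_current_symbol(frame: pd.DataFrame, alias_table: pd.DataFrame, tax_id: int) -> pd.DataFrame:
    """Hypothetical helper: rename index entries that are aliases to the current Symbol."""
    # Restrict the mapping to one species
    ref = alias_table[alias_table["tax_id"] == tax_id]
    current = set(ref["Symbol"])

    # Keep non-empty aliases that map to exactly one current Symbol
    pairs = ref.loc[ref["Alias"] != "", ["Alias", "Symbol"]].drop_duplicates()
    pairs = pairs[~pairs["Alias"].duplicated(keep=False)]
    alias_to_symbol = dict(zip(pairs["Alias"], pairs["Symbol"]))

    new_index, old_names = [], []
    for name in frame.index:
        if name in current or name not in alias_to_symbol:
            # Already current, or unknown: leave untouched
            new_index.append(name)
            old_names.append(None)
        else:
            new_index.append(alias_to_symbol[name])
            old_names.append(name)

    out = frame.copy()
    out.index = pd.Index(new_index)
    out["Alias"] = old_names  # records the replaced Symbol, missing if unchanged
    return out


# Toy mapping table in the same layout as the feather file (GeneID omitted)
alias_table = pd.DataFrame({
    "tax_id": [9606, 9606, 9606, 10090],
    "Symbol": ["H2BC1", "MACROH2A2", "UBB", "OtherGene"],  # last row is made up
    "Alias": ["HIST1H2BA", "H2AFY2", "", "UBB"],
})

# Toy frame indexed by Symbol
toy = pd.DataFrame(
    {"Species": ["Homo sapiens"] * 3},
    index=pd.Index(["HIST1H2BA", "H2AFY2", "UBB"], name="Gene Symbol"),
)

result = to_current_symbol(toy, alias_table, tax_id=9606)
print(result)
```

Symbols that are already current are checked first, so a name that happens to be both a current Symbol and an alias of another gene is left unchanged.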